From Tags to Topic Maps: Using Marked-up Hebrew Text to Discover Linguistic Patterns

Authors

  • Jan H. Kroeze
  • Theo J.D. Bothma
  • Machdel C. Matthee

Abstract

The paper discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. It identifies XML as a suitable mark-up language for building an exploitable data bank of the multi-dimensional linguistic data in the Hebrew text of the Old Testament. This concept is illustrated by tagging a transcription of Gen. 1:1-2:3 and manipulating the resulting data bank. Transferring the data into a three-dimensional array allows advanced processing, either to confirm existing knowledge or to mine for new, as yet undiscovered, linguistic features. Visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting this process of knowledge creation. The empirical study is a small experiment that illustrates the viability and usefulness of the proposed expert devices, as well as the benefits of applying information-system techniques to linguistic databases.
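The step from XML mark-up to a three-dimensional array can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the element names (`clause`, `phrase`, `gloss`, `syn`, `sem`), the clause IDs, the choice of three analysis levels, the English glosses and the function labels are all hypothetical, chosen only to show the idea. Each clause becomes one layer of the array, each phrase a row and each level of analysis a column; counting syntactic/semantic function pairs then stands in for the kind of query that could confirm or question existing assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical mark-up: element names, IDs and labels are illustrative only.
SAMPLE_XML = """
<text>
  <clause id="Gen01v01a">
    <phrase><gloss>in the beginning</gloss><syn>adjunct</syn><sem>time</sem></phrase>
    <phrase><gloss>created</gloss><syn>predicate</syn><sem>action</sem></phrase>
    <phrase><gloss>God</gloss><syn>subject</syn><sem>agent</sem></phrase>
    <phrase><gloss>the heavens and the earth</gloss><syn>object</syn><sem>product</sem></phrase>
  </clause>
  <clause id="Gen01v03a">
    <phrase><gloss>and said</gloss><syn>predicate</syn><sem>action</sem></phrase>
    <phrase><gloss>God</gloss><syn>subject</syn><sem>agent</sem></phrase>
  </clause>
</text>
"""

LEVELS = ("gloss", "syn", "sem")  # third dimension: hypothetical levels of analysis


def to_cube(xml_text):
    """Return (clause_ids, cube) where cube[c][p][l] is the annotation of
    phrase p in clause c at level l; shorter clauses are padded with None."""
    root = ET.fromstring(xml_text.strip())
    clauses = root.findall("clause")
    max_phrases = max(len(c.findall("phrase")) for c in clauses)
    clause_ids = [c.get("id") for c in clauses]
    cube = []
    for clause in clauses:
        rows = [[phrase.findtext(level) for level in LEVELS]
                for phrase in clause.findall("phrase")]
        # Pad so every clause has the same number of phrase slots.
        rows.extend([None] * len(LEVELS) for _ in range(max_phrases - len(rows)))
        cube.append(rows)
    return clause_ids, cube


def mapping_counts(cube, level_a="syn", level_b="sem"):
    """Count how often each value at level_a co-occurs with each value at
    level_b within the same phrase slot (a slice through the cube)."""
    a, b = LEVELS.index(level_a), LEVELS.index(level_b)
    counts = Counter()
    for clause in cube:
        for phrase in clause:
            if phrase[a] is not None:  # skip padding slots
                counts[(phrase[a], phrase[b])] += 1
    return counts


if __name__ == "__main__":
    ids, cube = to_cube(SAMPLE_XML)
    for (syn, sem), n in sorted(mapping_counts(cube).items()):
        print(f"{syn:10} -> {sem:8} {n}")
```

Padding short clauses with `None` keeps the structure rectangular, which is what lets it be treated as an array rather than a ragged list; a library such as NumPy could hold the same data, but plain lists keep the sketch dependency-free.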

Similar resources

Visualizing Mappings of Semantic and Syntactic Functions

This paper investigates the visualization of the mapping of semantic and syntactic functions that were marked up in an XML database containing linguistic data of the Biblical Hebrew text of Genesis 1:1-2:3. It focuses on two-dimensional topic maps as a graphical data-mining utility. The visual information is used to prompt the reconsideration of some existing assumptions and hypotheses about Bib...

Entities as topic labels: Improving topic interpretability and evaluability combining Entity Linking and Labeled LDA

Hurvitz, A. (2013). Late Biblical Hebrew, Khan. Khan, G. (ed.) (2013). Encyclopedia of Hebrew Language and Linguistics, Vol. 4, Leiden, Brill, 2013. Kutscher, E. Y. (1974). The Language and Linguistic Background of the Isaiah Scroll (1QIsaa), STDJ 6. Leiden, Brill. Oosting, R., Dyk, J. and Glanz, O., Valence Patterns of Motion Verbs, Semantics, Syntax and Linguistic Variation, to be published. ...

A review of text mining approaches and their function in discovering and extracting a topic

Background and aim: Four text mining methods are examined, with a focus on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature on text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...

A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew

This paper presents a comprehensive NLP system by Melingo that has recently been developed for Arabic, based on Morfix, a highly successful comprehensive Hebrew NLP system that was developed earlier and is already in operation. The system discussed includes modules for morphological analysis, context-sensitive lemmatization, vocalization, text-to-phoneme conversion, and syntactic-analysis-based prosody (intonation) ...

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

Publication date: 2017